139 research outputs found
NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition
BACKGROUND: Biomedical named entity recognition (Bio-NER) is a challenging problem because, in general, biomedical named entities of the same category (e.g., proteins and genes) do not follow one standard nomenclature. They have many irregularities and sometimes appear in ambiguous contexts. In recent years, machine-learning (ML) approaches have become increasingly common and now represent the cutting edge of Bio-NER technology. This paper addresses three problems faced by ML-based Bio-NER systems. First, most ML approaches employ singleton features that comprise one linguistic property (e.g., the current word is capitalized) and at least one class tag (e.g., B-protein, the beginning of a protein name). However, such features may be insufficient in cases where multiple properties must be considered. Adding conjunction features that contain multiple properties can be beneficial, but it would be infeasible to include all conjunction features in an NER model since memory resources are limited and some features are ineffective. To resolve this problem, we use a sequential forward search algorithm to select an effective set of features. Second, variations in the numerical parts of biomedical terms (e.g., "2" in the biomedical term IL2) cause data sparseness and generate many redundant features. In this case, we apply numerical normalization, which solves the problem by replacing all numerals in a term with one representative numeral to help classify named entities. Third, the assignment of NE tags does not depend solely on the target word's closest neighbors, but may depend on words outside the context window (e.g., a context window of five consists of the current word plus two preceding and two subsequent words). We use global patterns generated by the Smith-Waterman local alignment algorithm to identify such structures and modify the results of our ML-based tagger. This is called pattern-based post-processing.
RESULTS: To develop our ML-based Bio-NER system, we employ conditional random fields, which have performed effectively in several well-known tasks, as our underlying ML model. Adding selected conjunction features, applying numerical normalization, and employing pattern-based post-processing improve the F-scores by 1.67%, 1.04%, and 0.57%, respectively. The combined increase of 3.28% yields a total score of 72.98%, which is better than the baseline system that only uses singleton features.
CONCLUSION: We demonstrate the benefits of using the sequential forward search algorithm to select effective conjunction feature groups. In addition, we show that numerical normalization can effectively reduce the number of redundant and unseen features. Furthermore, the Smith-Waterman local alignment algorithm can help ML-based Bio-NER deal with difficult cases that need longer context windows.
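The numerical normalization and context-window ideas in the NERBio abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the representative numeral is "1" and that each run of digits collapses to a single numeral, neither of which the abstract specifies. The names `normalize_numerals` and `context_window` and the `<PAD>` token are hypothetical.

```python
import re


def normalize_numerals(term: str, representative: str = "1") -> str:
    """Replace every run of digits in a term with one representative numeral.

    Assumption: the representative numeral ("1") is chosen for illustration only.
    """
    return re.sub(r"\d+", representative, term)


def context_window(tokens, index, size=5, pad="<PAD>"):
    """Return the token at `index` plus size // 2 neighbors on each side.

    Positions that fall outside the sentence are filled with a padding token.
    """
    half = size // 2
    padded = [pad] * half + list(tokens) + [pad] * half
    return padded[index:index + size]


# Terms that differ only in their numerals map to the same normalized form.
print(normalize_numerals("IL2"), normalize_numerals("IL12"))
print(context_window(["the", "IL2", "gene", "is", "expressed"], 1))
```

With this normalization, variants such as IL2 and IL12 share one surface form, which is the sparseness reduction the abstract describes.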
Distributed Training Large-Scale Deep Architectures
Scale of data and scale of computation infrastructures together enable the
current deep learning renaissance. However, training large-scale deep
architectures demands both algorithmic improvement and careful system
configuration. In this paper, we focus on employing the system approach to
speed up large-scale training. Via lessons learned from our routine
benchmarking effort, we first identify bottlenecks and overheads that hinder
data parallelism. We then devise guidelines that help practitioners to
configure an effective system and fine-tune parameters to achieve desired
speedup. Specifically, we develop a procedure for setting minibatch size and
choosing computation algorithms. We also derive lemmas for determining the
quantity of key components such as the number of GPUs and parameter servers.
Experiments and examples show that these guidelines help effectively speed up
large-scale deep learning training.
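For readers unfamiliar with the data parallelism mentioned in this abstract, here is a minimal sketch of the general idea: a minibatch is split across workers, each worker computes a local gradient, and the gradients are averaged. It does not reproduce the paper's guidelines or lemmas, which the abstract does not state. The toy 1-D model, the function names, and the sample data are all assumptions made for illustration.

```python
def gradient(w, batch):
    """Gradient of mean squared error for the toy 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_gradient(w, batch, num_workers):
    """Split the minibatch evenly across workers and average their gradients."""
    if len(batch) % num_workers != 0:
        raise ValueError("minibatch size must be divisible by num_workers")
    shard_size = len(batch) // num_workers
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    local_grads = [gradient(w, shard) for shard in shards]  # one gradient per worker
    # Averaging step; in a real system a parameter server or all-reduce does this.
    return sum(local_grads) / num_workers


# Hypothetical data: (x, y) pairs.
batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.5), (4.0, 7.5)]
full = gradient(0.5, batch)
parallel = data_parallel_gradient(0.5, batch, num_workers=2)
print(full, parallel)
```

With equal-sized shards, the averaged gradient matches the full-minibatch gradient, so adding workers changes where the computation runs rather than the update itself. Communication of the local gradients is the kind of overhead the abstract refers to.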
Effectiveness of influenza vaccination in patients with end-stage renal disease receiving hemodialysis: a population-based study.
Background: Little is known about the effectiveness of influenza vaccine in ESRD patients. This study compared the incidence of hospitalization, morbidity, and mortality in end-stage renal disease (ESRD) patients undergoing hemodialysis (HD) between cohorts with and without influenza vaccination. Methods: We used the insurance claims data from 1998 to 2009 in Taiwan to determine the incidence of these events within one year after influenza vaccination in the vaccine (N = 831) and the non-vaccine (N = 3187) cohorts. The vaccine cohort to the non-vaccine cohort incidence rate ratio and hazard ratio (HR) of morbidities and mortality were measured. Results: The age-specific analysis showed that the elderly in the vaccine cohort had a lower hospitalization rate (100.8 vs. 133.9 per 100 person-years), contributing to an overall HR of 0.81 (95% confidence interval (CI) 0.72-0.90). The vaccine cohort also had an adjusted HR of 0.85 (95% CI 0.75-0.96) for heart disease. The corresponding incidence of pneumonia and influenza was 22.4 versus 17.2 per 100 person-years, but with an adjusted HR of 0.80 (95% CI 0.64-1.02). The vaccine cohort had lower risks than the non-vaccine cohort for intensive care unit (ICU) admission (adjusted HR 0.20, 95% CI 0.12-0.33) and mortality (adjusted HR 0.50, 95% CI 0.41-0.60). The time-dependent Cox model revealed an overall adjusted HR for mortality of 0.30 (95% CI 0.26-0.35) after accounting for vaccination in multiple years. Conclusions: ESRD patients with HD receiving the influenza vaccination could have reduced risks of pneumonia/influenza and other morbidities, ICU stay, hospitalization and death, particularly for the elderly.
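The incidence rates and rate ratio reported in this abstract follow a simple calculation, sketched below. The two hospitalization rates are taken from the abstract. The function names and the event and person-year counts in the first call are hypothetical. The crude ratio computed here is not the adjusted hazard ratio the authors report, which comes from a regression model.

```python
def incidence_rate_per_100_py(events: int, person_years: float) -> float:
    """Incidence rate expressed per 100 person-years of follow-up."""
    return 100.0 * events / person_years


def rate_ratio(rate_exposed: float, rate_unexposed: float) -> float:
    """Crude incidence rate ratio of the exposed to the unexposed cohort."""
    return rate_exposed / rate_unexposed


# Hypothetical counts, only to show the unit conversion.
print(incidence_rate_per_100_py(events=50, person_years=200.0))

# Rates from the abstract: elderly hospitalization, vaccine vs. non-vaccine cohort.
elderly_hospitalization_irr = rate_ratio(100.8, 133.9)
print(round(elderly_hospitalization_irr, 2))
```

A ratio below 1 indicates a lower crude rate in the vaccinated cohort.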
BIOSMILE: A semantic role labeling system for biomedical verbs using a maximum-entropy model with automatically generated template features
Background: Bioinformatics tools for automatic processing of biomedical literature are invaluable for both the design and interpretation of large-scale experiments. Many information extraction (IE) systems that incorporate natural language processing (NLP) techniques have thus been developed for use in the biomedical field. A key IE task in this field is the extraction of biomedical relations, such as protein-protein and gene-disease interactions. However, most biomedical relation extraction systems usually ignore adverbial and prepositional phrases and words identifying location, manner, timing, and condition, which are essential for describing biomedical relations. Semantic role labeling (SRL) is a natural language processing technique that identifies the semantic roles of these words or phrases in sentences and expresses them as predicate-argument structures. We construct a biomedical SRL system called BIOSMILE that uses a maximum entropy (ME) machine-learning model to extract biomedical relations. BIOSMILE is trained on BioProp, our semi-automatic, annotated biomedical proposition bank. Currently, we are focusing on 30 biomedical verbs that are frequently used or considered important for describing molecular events. Results: To evaluate the performance of BIOSMILE, we conducted two experiments to (1) compare the performance of SRL systems trained on newswire and biomedical corpora; and (2) examine the effects of using biomedical-specific features. The experimental results show that using BioProp improves the F-score of the SRL system by 21.45% over an SRL system that uses a newswire corpus. It is noteworthy that adding automatically generated template features improves the overall F-score by a further 0.52%. Specifically, ArgM-LOC, ArgM-MNR, and Arg2 achieve statistically significant performance improvements of 3.33%, 2.27%, and 1.44%, respectively. 
Conclusion: We demonstrate the necessity of using a biomedical proposition bank for training SRL systems in the biomedical domain. Besides the different characteristics of biomedical and newswire sentences, factors such as cross-domain framesets and verb usage variations also influence the performance of SRL systems. For argument classification, we find that NE (named entity) features indicating if the target node matches with NEs are not effective, since NEs may match with a node of the parsing tree that does not have semantic role labels in the training set. We therefore incorporate templates composed of specific words, NE types, and POS tags into the SRL system. As a result, the classification accuracy for adjunct arguments, which is especially important for biomedical SRL, is improved significantly.
Carboxyl-terminal truncated HBx regulates a distinct microRNA transcription program in Hepatocellular carcinoma development
Background: The biological pathways and functional properties by which misexpressed microRNAs (miRNAs) contribute to liver carcinogenesis have been intensively investigated. However, little is known about the upstream mechanisms that deregulate miRNA expressions in this process. In hepatocellular carcinoma (HCC), hepatitis B virus (HBV) X protein (HBx), a transcriptional trans-activator, is frequently expressed in truncated form without carboxyl-terminus but its role in miRNA expression and HCC development is unclear. Methods: Human non-tumorigenic hepatocytes were infected with lentivirus-expressing full-length and carboxyl-terminal truncated HBx (Ct-HBx) for cell growth assay and miRNA profiling. Chromatin immunoprecipitation microarray was performed to identify the miRNA promoters directly associated with HBx. Direct transcriptional control was verified by luciferase reporter assay. The differential miRNA expressions were further validated in a cohort of HBV-associated HCC tissues using real-time PCR. Results: Hepatocytes expressing Ct-HBx grew significantly faster than the full-length HBx counterparts. Ct-HBx decreased while full-length HBx increased the expression of a set of miRNAs with growth-suppressive functions. Interestingly, Ct-HBx bound to and inhibited the transcriptional activity of some of these miRNA promoters. Notably, some of the examined repressed-miRNAs (miR-26a, -29c, -146a and -190) were also significantly down-regulated in a subset of HCC tissues with carboxyl-terminal HBx truncation compared to their matching non-tumor tissues, highlighting the clinical relevance of our data. Conclusion: Our results suggest that Ct-HBx directly regulates miRNA transcription and in turn promotes hepatocellular proliferation, thus revealing a viral contribution of miRNA deregulation during hepatocarcinogenesis. © 2011 Yip et al.
Retrospective comparison between a regular and a split-dose protocol of 5-fluorouracil, cisplatin, and mitoxantrone for the treatment of far advanced hepatocellular carcinoma
Background: In patients with advanced hepatocellular carcinoma (HCC), combination chemotherapy using 5-fluorouracil, cisplatin, and mitoxantrone (FMP) could achieve a response rate > 20%, but the beneficial effect was compromised by formidable adverse events. Chemotherapy given in a split-dose manner was associated with reduced toxicities. In this retrospective study, we compared the efficacies and side effects between a regular and a split-dose FMP protocol approved in our medical center. Methods: From 2005 to 2008, the clinical data of 84 patients with far advanced HCC, who had main portal vein thrombosis and/or extrahepatic metastasis, were reviewed. Of them, 65 were treated by either regular (n = 27) or split-dose (n = 38) FMP and had completed at least one therapeutic course. The remaining 19 patients were untreated. Clinical parameters, therapeutic responses, survivals and adverse events were compared. Results: The median overall survival was 6.0, 5.2, and 1.5 months, respectively, in patients receiving regular FMP, split-dose FMP, and no treatment (regular versus split-dose group, P = 0.447; regular or split-dose versus untreated group, P < 0.0001). Patients receiving split-dose treatment had a significantly lower risk of grade 3/4 neutropenia (51.9% with regular versus 10.5% with split-dose FMP, P = 0.0005). When the two treated groups were combined, the median overall survival was 10.6 and 3.8 months, respectively, for patients achieving disease control and those with progressive disease (P < 0.001). A Cox proportional hazards model identified Child-Pugh stage B (hazard ratio [HR], 2.216; P = 0.006), presence of extrahepatic metastasis (HR, 0.574; P = 0.048), and achievement of disease control (HR, 0.228; P < 0.001) as independent factors associated with overall survival. Logistic regression analysis revealed that anti-hepatitis C virus antibody (odds ratio [OR], 9.219; P = 0.002), tumor size (OR, 0.816; P = 0.036), and previous anti-cancer therapy (OR, 0.195; P = 0.017) were significantly associated with successful disease control. Conclusions: Comparable overall survival was observed between patients receiving regular and split-dose FMP therapies. Patients receiving split-dose therapy had a significantly lower risk of grade 3/4 neutropenia. Positive anti-hepatitis C virus antibody, smaller tumor size, and absence of previous anti-cancer therapy were independent predictors for successful disease control.
Brain computerized tomography reading in suspected acute ischemic stroke patients: what are essentials for medical students?
Background
Few systematic methods prioritize imaging education for medical students (MS). We hope to develop a checklist for brain computerized tomography (CT) reading in patients with suspected acute ischemic stroke (AIS) for MS and primary care (PC) physicians.
Methods
Our pilot group generated the items indicating specific structures or signs for the checklist of brain CT reading in suspected AIS patients for MS and PC physicians. These items were used in a modified web-based Delphi process using the online software “SurveyMonkey”. In total 15 panelists including neurologists, neurosurgeons, neuroradiologists, and emergency department physicians participated in the modified Delphi process. Each panelist was encouraged to express feedback, agreement or disagreement on the inclusion of each item using a 9-point Likert scale. Items with median scores of 7–9 were included in our final checklist.
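The inclusion rule described in these methods (items with a median score of 7 to 9 on a 9-point Likert scale are kept) can be sketched as follows. This is an illustration only: the function name, the item labels, and the ratings are hypothetical and are not the panel's data.

```python
from statistics import median


def select_items(ratings_by_item, threshold=7):
    """Keep items whose median panelist rating is at or above the threshold."""
    selected = []
    for item, ratings in ratings_by_item.items():
        if any(r < 1 or r > 9 for r in ratings):
            raise ValueError("ratings must lie on the 9-point scale (1 to 9)")
        if median(ratings) >= threshold:
            selected.append(item)
    return selected


# Hypothetical ratings from 15 panelists for two made-up checklist items.
example = {
    "hypothetical item A": [9, 8, 8, 7, 9, 7, 8, 9, 6, 8, 7, 9, 8, 7, 8],
    "hypothetical item B": [5, 6, 4, 7, 5, 6, 3, 5, 6, 4, 5, 7, 6, 5, 4],
}
print(select_items(example))
```

Using the median rather than the mean keeps a single outlying panelist from moving an item across the inclusion threshold.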
Results
Fifty-two items were initially provided for the first round of the Delphi process. Of these, 35 achieved general agreement of being an essential item for the MS and PC physicians. The other 17 of the 52 items in this round and another two added items suggested by the panelists were further rated in the next round. Finally, 38 items were included in the essential checklist items of brain CT reading in suspected AIS patients for MS and PC physicians.
Conclusions
We established a reference regarding the essential items of brain CT reading in suspected AIS patients. We hope this helps to minimize malpractice and delayed diagnosis, and to improve competency-based medical education for MS and PC physicians.